Unsupervised estimation of the language model scaling factor

Authors

Christopher M. White

Ariya Rastrow

Sanjeev Khudanpur

Frederick Jelinek

Abstract

This paper addresses the adjustment of the language model (LM) scaling factor of an automatic speech recognition (ASR) system for a new domain using only untranscribed speech. The main idea is to replace the (unavailable) reference transcript with an automatic transcript generated by an independent ASR system, and to adjust parameters using this sloppy reference. It is shown that, despite the fairly high error rate of the sloppy reference (ca. 35%), choosing the scaling factor to minimize disagreement with the erroneous transcripts is still an effective recipe for model selection. This effectiveness is demonstrated by adjusting an ASR system trained on Broadcast News to transcribe the MIT Lectures corpus. An ASR system for telephone speech produces the sloppy reference, and optimizing towards it yields a nearly optimal LM scaling factor for the MIT Lectures corpus.
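The selection recipe described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each utterance comes with an N-best list carrying separate acoustic and LM log-scores, that hypotheses are ranked by acoustic score plus scale times LM score, and that disagreement is measured as word-level edit distance against the sloppy reference. All names (`Hypothesis`, `select_lm_scale`, and so on) are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical N-best entry: (words, acoustic log-score, LM log-score).
Hypothesis = Tuple[List[str], float, float]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def best_hypothesis(nbest: Sequence[Hypothesis], lm_scale: float) -> List[str]:
    """Pick the entry maximizing acoustic + lm_scale * LM (assumed scoring rule)."""
    return max(nbest, key=lambda h: h[1] + lm_scale * h[2])[0]


def disagreement(nbest_lists: Sequence[Sequence[Hypothesis]],
                 sloppy_refs: Sequence[Sequence[str]],
                 lm_scale: float) -> float:
    """Word error rate of the rescored output against the sloppy reference."""
    errors = 0
    ref_words = 0
    for nbest, ref in zip(nbest_lists, sloppy_refs):
        errors += edit_distance(ref, best_hypothesis(nbest, lm_scale))
        ref_words += len(ref)
    return errors / max(ref_words, 1)


def select_lm_scale(nbest_lists: Sequence[Sequence[Hypothesis]],
                    sloppy_refs: Sequence[Sequence[str]],
                    candidates: Sequence[float]) -> float:
    """Grid search: return the candidate scale with the lowest disagreement."""
    return min(candidates,
               key=lambda s: disagreement(nbest_lists, sloppy_refs, s))
```

In this sketch the sloppy references would be the output of the independent recognizer, and `candidates` a coarse grid of scaling factors. Ties are resolved in favour of the earliest candidate in the grid.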

Similar resources

Attitudes towards English as an International Language (EIL) in Iran: Development and Validation of a New Model and Questionnaire

This study aimed at developing and validating a new model and instrument to explore attitudes of Iranian EFL learners towards English as an International Language (EIL). In so doing, the researchers followed several rigorous steps including extensive literature review, content selection, item generation, designing the rating scales and personal information part, Delphi technique, item revision,...

Prostate Helical Tomotherapy: A semi-empirical estimation of the scaling factor based on 2D approximating field

Background: In Helical Tomotherapy (HT), the scaling factor (SF) is the time in seconds that each leaf viewing a target would need to be open to deliver the prescribed dose. The SF is patient-specific and is used to calculate the rotational period of the gantry, and the total treatment time (TTT) of the HT. The SF is generally difficult to estimate. Currently, it takes about one hour t...

Implicational Scaling of Reading Comprehension Construct: Is it Deterministic or Probabilistic?

In English as a Second Language Teaching and Testing situations, it is common to infer about learners’ reading ability based on his or her total score on a reading test. This assumes the unidimensional and reproducible nature of reading items. However, few researches have been conducted to probe the issue through psychometric analyses. In the present study, the IELTS exemplar module C (1994) wa...

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

Publication date: 2009
